Real-Time Data Harvesting Method for Czech Twitter
Abstract
This paper deals with the automatic analysis of Czech social media. Its main goal is to propose an approach for harvesting interesting Czech-language messages from Twitter at a high download speed. The method uses user lists to discover potentially interesting tweets to download. It is motivated by two observations. First, only about 20% of Twitter users post informative messages, whereas the remaining 80% do not. Second, these “important” users can be identified through user lists. The experimental results show that the proposed method is very efficient: it harvests about six times more data than the other approaches. The approach is intended to be integrated into an experimental system for the Czech News Agency. That system will monitor the current data flow on Twitter, download messages in real time, analyze them, and extract relevant events.
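The abstract states only that user lists are used to identify the “important” users. It does not give the selection rule. The following Python sketch assumes one plausible rule, which is not confirmed as the authors' method:

1. Start from seed accounts known to be informative.
2. Collect every user list that contains a seed.
3. Rank the other members of those lists by how many of the lists include them.

All names are hypothetical, including `discover_users`, `memberships`, `list_members`, `min_lists` and the account handles. The dictionaries stand in for data a crawler would fetch from Twitter; no real API is called.

```python
from collections import Counter
from typing import Dict, List, Set


def discover_users(
    seeds: Set[str],
    memberships: Dict[str, List[str]],
    list_members: Dict[str, List[str]],
    min_lists: int = 2,
) -> List[str]:
    """Rank candidate users by how many seed-related lists include them.

    Hypothetical sketch: `memberships` maps a user to the IDs of the lists
    that contain that user, and `list_members` maps a list ID to its members.
    Both are assumed inputs standing in for data fetched from Twitter.
    """
    # Gather every list that contains at least one seed user.
    related_lists: Set[str] = set()
    for user in seeds:
        related_lists.update(memberships.get(user, []))

    # Count in how many related lists each non-seed user appears.
    counts: Counter = Counter()
    for list_id in related_lists:
        for member in set(list_members.get(list_id, [])):
            if member not in seeds:
                counts[member] += 1

    # Keep users curated into at least `min_lists` lists, most-listed first;
    # ties are broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [user for user, n in ranked if n >= min_lists]


if __name__ == "__main__":
    # Invented example data: one seed account that belongs to two lists.
    memberships = {"news_seed": ["czech-media", "prague-news"]}
    list_members = {
        "czech-media": ["news_seed", "outlet_a", "outlet_b", "reporter_c"],
        "prague-news": ["news_seed", "outlet_b", "casual_d"],
    }
    print(discover_users({"news_seed"}, memberships, list_members))
```

With the sample data, only `outlet_b` appears in both lists of the seed, so it is the only user returned at the default threshold. A real harvester would then stream tweets from the selected users. Counting list memberships is a simple proxy for “importance”: many people independently curating an account into topical lists suggests its posts are informative.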
Similar resources
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing this huge amount of data has become one of the concerns of data science scholar...
Sarcasm Detection on Czech and English Twitter
This paper presents a machine learning approach to sarcasm detection on Twitter in two languages – English and Czech. Although there has been some research in sarcasm detection in languages other than English (e.g., Dutch, Italian, and Brazilian Portuguese), our work is the first attempt at sarcasm detection in the Czech language. We created a large Czech Twitter corpus consisting of 7,000 manu...
A framework for real-time Twitter data analysis
Twitter is a popular social network which allows millions of users to share their opinions on what happens all over the world. In this work we present a system for real-time Twitter data analysis in order to follow popular events from the user’s perspective. The method we propose extends and improves the Soft Frequent Pattern Mining (SFPM) algorithm by overcoming its limitations in dealing with...
Getting There First: Real-Time Detection of Real-World Incidents on Twitter
Social networking and micro-blogging services such as Twitter have become a valuable source of information on current events. Widespread use of Twitter on mobile devices and personal computers enables users to share short messages on any subject in real-time, thus making it suitable for early detection of unexpected events where fast response is critical. In this paper, we present an online met...
Method for Measuring Twitter Content Influence
Twitter is a microblogging website with specific characteristics not found in other social network services. This platform contains a good deal of valuable content, and users can access this content using Twitter search. However, Twitter search returns only time-descending ordered content including keywords. Thus, we propose a linear-time method of measuring the influence of Twitter content con...
Publication date: 2017